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> ! Abstract 

2] ! Models of reaction chemistry based on the stochastic simulation algorithm (SSA) have become 

'^^ ■ a crucial tool for simulating complicated biological reaction networks due to their ability to handle 

extremely complicated networks and to represent noise in small-scale chemistry. These methods 
^Jr\ can, however, become highly inefficient for stiff reaction systems, those in which different reaction 

channels operate on widely varying time scales. In this paper, we develop two methods for acceler- 
ating sampling in SSA models: an exact method and a scheme allowing for sampling accuracy up 
to any arbitrary error bound. Both methods depend on analysis of the eigenvalues of continuous 

(N 

K^ ' time Markov models that define the behavior of the SSA. We show how each can be applied to 

00 : 

^£ ' accelerate sampling within known Markov models or to sub-graphs discovered automatically during 

'NT 

^~. . execution. We demonstrate these methods for two applications of sampling in stiff SSAs that are 

^, 

^D ' important for modeling self-assembly reactions: sampling breakage times for multiply-connected 

OO . 

^^ ' bond networks and sampling assembly times for multi-subunit nucleation reactions. We show theo- 
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retically and empirically that our eigenvalue methods provide substantially reduced sampling times 
for a large class of models used in simulating self-assembly. These techniques are also likely to have 
broader use in accelerating SSA models so as to apply them to systems and parameter ranges that 
are currently computationally intractable. 



I. INTRODUCTION 

Stochastic simulation methods have become increasingly widespread as a means of sim- 
ulating and analyzing biochemical reaction kinetics^. The chemical master equation, which 
governs the reaction kinetics for well-mixed systems, forms the basis for the stochastic sim- 
ulation algorithm (SSA), proposed by Gillespie^'^. SSA models a reaction system as a Con- 
tinuous Time Markov Model (CTMM) in which states of the system are defined by counts of 
reactants present at a given point in time and transitions between states correspond to in- 
dividual reaction events. This SSA approach is valuable in part because it provides a model 
of reaction noise, which can become significant for reaction networks on cellular scales^. 
Furthermore, SSA models can provide significant computational advantages over continuum 
models for networks characterized by extremely large sets of possible reaction intermediates. 
The computational value of the SSA approach lies in the fact that for a large class of net- 
works, the random walk visits only a small fraction of the state space before equilibrium is 
established. As a result, kinetics on complicated networks can be simulated "on the fly," 
requiring explicit construction of the CTMM network only in the immediate vicinity of those 
states visited on a given trajectory. This property is an essential requirement for any fea- 
sible simulation algorithm, since the size of the state space describing the master equation 
is astronomical even for modest system sizes. Successful applications of SSA include gene 
regulatory networks* and self-assembly of complicated structures, such as virus capsids^*^. 
Furthermore, the SSA approach has now been adopted by several approaches for whole-cell 
modeling^i^ and modeling generic complex reaction networks^ii^. 

The relaxation time of the SSA can, however, be extremely sensitive to the transition 
rates controlling the reaction kinetics. A pure SSA model has difficulty with stiff reaction 
systems, i.e., those where important events occur in parallel on very different time scales. In 
such cases, a simulation can become bogged down by sampling fast events to the exclusion 
of the slow events. Hybrid discrete/stochastic model3^>^>^ can resolve this problem in some 
domains, but not when the fast reactions make use of too many intermediates to allow them 
to be modeled continuously. One important example of such a stiff reaction system is the 
breaking of bond networks, where individual bonds may break and repair repeatedly before 
a sufficiently large bond group is broken to fracture the network. Another form of stiff SSA 
network occurs near the critical concentration of a self-assembly system, where high-order 
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FIG. 1: Illustration of trapped subgraphs in SSA models, (a) A simple CTMM on a 3-path with 
transition rates a and b. (b) The probability landscape for the model. SSA is slow whenever the 
invariant density for the corresponding Markov chain is irregular. Here, the SSA takes 0{a/b) steps 
to reach vertex 3. (c) CTMM model of a trimer assembly system with three subunits. Graph of 
possible configurations joined by reaction rates. States in which the trimer is broken are surrounded 
by solid lines and others by dashed lines. 

nucleation events can be orders of magnitude slower than individual binding reactions. In 
these stiff systems, an SSA model can become "trapped" for many steps in a small subset 
of the state space, resulting in negligible simulation progress for long periods of time. 

To understand these "trapped" systems, it is useful to consider the graph theoretic rep- 
resentation of the SSA method. An SSA model is represented by a graph in which each 
node corresponds to one possible state of the full model. Edges connect nodes whose states 
can be reached from one another by a single reaction event, e.g., two molecules binding to 
one another. At each simulation step, SSA considers only the immediate neighbors of the 
current state. As a result, the simulation is prone to traps that can result from irregular- 
ities in the invariant density of the embedded Markov chain (EMC) implemented by SSA 
for a given CTMM. For example, consider a 3-state CTMM represented by a simple path 
(Fig. [U^a)), where the backward transition rate a is much larger than the forward transition 



rate b. The average number of SSA steps to reach state 3 from initial state 1 is 0{a/b) 
because once SSA visits state 2, it will jump to state 3 only a b/a fraction of the time. 
Nodes 1 and 2 collectively define a trapped subgraph from which the model must escape. In 
general, for an A^-path, where each forward rate b is smaller than the backward rate a, SSA 
takes 0{a/b)'^~'^ steps to traverse the path (see Theorem IIII.ll for an analogous problem). 
One way such a trapped subgraph can arise in a physical system is through models of the 
breakage of bond networks. Fig. [Tt^c) shows the graph arising from a model of the breakage 
of a three-cycle bond network, which behaves similarly to the 3-state CTMM by establishing 
a trapped inner graph of four states — the unbroken state and three states with a single 
broken bond — from which the model must escape to reach any broken network state. We 
can alternatively understand the trapping problem in terms of a probability landscape view 
of a reaction system. The SSA is sluggish whenever its equilibrium landscape is irregular, 
consisting of valleys and hills. The broader and deeper these are, the slower SSA becomes. 

To overcome the presence of traps or landscape irregularity, we propose two non-local 
simulation algorithms that rely on the spectral decomposition of the Kolmogorov matrix 
(for a CTMM) or the transition matrix (for the Embedded Markov Chain (EMC)). These 
eigenvalues and their associated eigenvectors describe global modes of relaxation of the full 
graph or any of its sub-graphs. Since eigenvalues are global properties of a graph, spectral 
methods are much less sensitive to local landscape traps. These methods can be applied 
to quickly sample first passage times on small CTMM graphs such as those in Fig. [1] or to 
sample escape times from trapped subgraphs when the full model is prohibitively large. 

Previous attempts at simulating rare events include the Forward Flux Sampling (FFS) 
technique of Allen et alM- and related methodsi^'^S. The approach breaks a rare event 
into a series of relatively more probable stages and uses estimates of waiting times for the 
successive stages to develop an aggregate transition rate for the full event. This aggregate 
rate can then be used to approximate the first passage time density as a single exponential 
random variable. However, while the exponential tail dominates the density for stiff systems 
and is therefore a highly accurate approximation in many cases, the true probability density 
has a peak at short times followed by a mixed exponential tail. The methods developed 
in the present work, by contrast to the FFS-like methods, sample first passage times from 
the entire density to within arbitrary an error bound. Recently, another method called the 
slow-scale SSA was proposed by Cao et a/.— »^, which relies on a technique called the Partial 



Equilibrium Approximation (PEA). PEA essentially assumes that the set of fast reactions 
are always in equilibrium and the method approximates transition rates between slowly 
varying reactant species by their expected value in the partial equilibrium state. While these 
methods can provide significant benefits for some CTMMs, there are several limitations in 
using PEA or similar approximations for arbitrary graphs. First, a clear distinction between 
fast and slow species may not be obvious in a given problem. For example, in rule-based 
simulation of bond networks, stiffness is built in through the association/dissociation rates 
of individual bonds rather than being species dependent. Secondly, these methods always 
need to be supplemented with approximations involved in computing the mean values of 
the reaction propensities. Furthermore, PEA will be inaccurate whenever fluctuations in 
the reaction propensities within the partial equilibrium state are comparable to their mean 
values. 

The goal of the present work is to develop efficient methods for some important classes of 
stiff SSA model for which the above techniques are unsuitable, with a particular emphasis on 
models important to simulations of self-assembly reactions. The methods proposed in this 
paper can be applied to Markov processes on arbitrary graphs. Furthermore, they can be 
made accurate to within arbitrary error bounds. The remainder of this paper is organized as 
follows: Section Thai sets up some basic notation and a description of the sampling problem 
for general CTMM. In Section Hi Bl we introduce a spectral method which relies on the eigen 
decomposition of the master equation describing the CTMM. We use a complete spectral 
decomposition of the first passage time density and rejection sampling to return sample 
first passage times for arbitrary CTMMs. In section HI CI we introduce another spectral 
method which works as a hybrid between the purely local SSA and the completely nonlocal 
Master Equation method. The latter method proceeds by adaptively constructing a basis 
in which to simulate the Markov chain until the system state has relaxed to its slowest 
eigenvector. If first-passage out of the trapped subgraphs does not occur by that time, 
we use the appropriate eigenvalue to sample the time to first-passage as an exponential 
random variable. In section Hi Dl we introduce a method for automated discovery of trapped 
regions in stiff Markov models. This technique allows efficient implementation of spectral 
methods for large state spaces by isolating regions repeatedly visited by a given random 
trajectory and using spectral sampling to escape any such subgraph. In section Hill we 
present theoretical results on the time complexity of SSA for bond networks followed by 



experiments on some special classes of bond networks to compare the simulation efficiency 
of each method discussed. In Section [IV] we evaluate the automated discovery variants of the 
method by applying them to models of a nucleation-limited assembly system with a state 
space too large to explicitly construct. Section |V] concludes the paper with a discussion of 
results and directions for future research. 

II. THEORY 

A. The chemical master equation and the stochastic simulation algorithm 

The SSA identifies reaction kinetics for networks of biochemical subunits as a Markov 
process governed by an appropriate Chapman- Kolmogorov equation or, equivalently, its 
differential version - the master equation. Let S = {1,2, ... ,Ns} be the state space for 
the CTMM, each node representing a possible state for the simulated system. The time 
evolution of probability densities is governed by a Kolmogorov matrix W, which specifies 
the transition rates Wnm from the state m to n. 

~jr = X] ^nmPmit) - WmnPn{t) (l) 

meS 

where, p„(t) denotes the probability to be in state n at time t. The matrix elements Wnm 
satisfy two necessary conditions: 

1- Wnm > for n ^ m. 

2- Y.m ^nm = 0. 

Under these conditions, it is well known that the matrix has a steady state solution |n) = 
Yln'^ri\n) that is an eigenvector of W with eigenvalue zero and that all initial distributions 
relax to |n) in the limit of long times^. In addition, we will require W to satisfy the 
detailed balance condition, which states that at equilibrium, the sum of probability current 
exchanged between any pair of states {n,m) is zero, i.e., Wnm^^m = WmnT^n- This in turn 
allows one to define a scalar product on the state space such that W is self-adjoint: 

1 
(n|m) = 5nm — (2) 

T^m 

This condition ensures that we can construct an orthogonal eigenbasis and compute time 
evolved versions of any given initial probability distribution using spectral decomposition. 



B. Spectral Sampling 1: Master Equation approach 

Given a Kolmogorov matrix W ona state space 5* and an arbitrary initial state i & V G S, 
the first-passage time Tp{i) is a random variable which gives the time at which the trajectory 
first reaches any state in some subset of the state space F = S — V. The standard method 
of solving a first passage problem is to set up the master equation for V with an absorbing 
boundary over F (zero Dirichlet boundary condition)—. Let Py be a projection operator 
onto the subspace V and let N be the cardinality of V. Then, M = PyWPv is the effective 
Kolmogorov matrix that governs time evolution over V . From detailed balance, M is self- 
adjoint over LF^-1 ■ Hence, the eigenvectors of M form a complete basis { iV^a) }• A consequence 
of the spectral theorem is the completeness relation for the properly normalized eigenbasis, 
i.e., {'ipal'ipp) = Sai3 ■ Given any vector \ri): 

N 

1. Spectral decomposition of the first-passage time distribution 

In terms of the vertex set basis, the completeness relation over L^„i is / = X^ney ^^^ 
where P„ = 7Tn\n){n\ is the projector onto vertex state \n). Given an initial probability 
density Pn{t = 0) = (5„j the probability for state n E V evolves as: 

N 

Pn{t)\n) = Pne'^'lt) = 7Tn\n){n\Y,{^a\i)e~^-'\^a) 

N 
=^Pn{t) = ^T^n^a,n^a,iexp[-Xat] (4) 



The transition to an element f & F outside of ^ , is governed by the following equation: 

neV 

N 

= ^c„jexp[-Aat] (5) 

The probability for a first passage to the state / between t and t + (it is hence given by 
p{Tf = t)dt = Y.'l^^c^je-^'^'dt. 



2. Exact sampling for the first-passage time distribution 

In this section we describe a method for returning a sample time from the computed first- 
passage density p{t) = ^j^j^ CjC"^** to any state f E F. A general method for sampling from 
complicated distributions is to use the method of rejection sampling, which first chooses 
a random variable from a convenient envelope density and accepts or rejects the sample 
based on a second random sample that depends on the tightness of the envelope fit. The 
rejection rate is low if the envelope curve closely approximates the given curve. A simple 
envelope curve is provided by a pure exponential of the most slowly decaying eigenvalue, 
with a coefficient equal to the sum of all positive terms Xlc >o '^* ^^ ^^^ computed density 
p{t). However, there is no guarantee that the rejected part is small. Since each eigen mode 
encloses an area q/Aj, cancellations between near-degenerate eigenvalues can in principle 
lead to a high rejection ratio. We therefore present a method for choosing an envelope curve 
g{t) which eliminates these cancellations. Furthermore, in section UlIBI we show that the 
envelope curve is exact for bond networks generated by cycle graphs Cn- We sample from 
g{t) using a decomposition into a discrete mixture of densities fa(t) = (ia(e~^"*— 6^^"+^*) and 
an efficient rejection step. Here d^ are constants, one for each component /„ of the envelope 
curve g{t). The next theorem proves that the density fa{t) can be sampled efficiently using 
a rejection method. 

Theorem II. 1. The expected rejection ratio for fa{t) is bounded from above by 1.5. 

Proof. We will use a simple exponential ha{t) = CAexp[— At] as the envelope function. In 
order to minimize C we choose ha{t) such that 

haiQ = fa{Q (6) 

and 



dha 



dfa 



dt 
where t* is defined implicitly by the condition 



t=u d^ 



dt^ 



(7) 



(8) 



t=t. 



These constraints yield a unique solution t^ = 2 "^ °'^^' "\ Since -^ > for t > t,,,, the 
slope of ln[/Q,] monotonically increases to — Aq, as t ^ oo. The corresponding envelope rate 



AlgorithmtPrepare Discrete Mixture 

Input: First passage time probability density p 
Output: Envelope g{t) = ^iSifi{t), rejection ratio R 

Sort the list {q} in increasing order of A^; 
Initialize i? ^ 0; 
fori e {l,...iV} do 

Compute the partial sums pi ^- X^^^^ c„; 

if Pi > then 

S^^ Pi* {X^+l - Ai)/(A,Ai+i); 
R < — R -\- Si', 

end 
end 
Return {Si}, R; 



Algorithni:Sample first passage time 

Input: Probability density p{t) = ^iCiC^** 
Output: Sample time T distributed according to p 

Set {Si}, R ^- Prepare Discrete Mixture(p); 

rejectl ^- 0; 

while rejectl < 1 do 

Generate a uniform [0, 1] random variate U; 

prob ^ 0; 

mix ^- 1; 

while prob < U do 

prob ^ prob + Smix/R] 
mix ^ mix + 1; 

end 

reject2 <— 0; 

Compute an envelope f^i^ < R^i^e^^™-"'^: 

oer (jmix ^ x^mix * ■^mix-\-l ) / \^mix i ^mix+lj) 
Compute Rmixi^mix, ^mix+l)] 

while reject2 < 1 do 

Generate an exponential random variate T, with mean !/§„ 
Generate a uniform random variate [0,1] X; 
rejectl < fmix{T) . 

■J exp{-gmixT)*Rniix*X ^ 

end 

Generate a uniform random variate [0,1] Y; 

rejectl ^ ,^^; 
end 
Return T; 



FIG. 2: Pseudocode for spectral method 1 
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then satisfies A = , ", r"*"^ ^ A^. The rejection ratio is given by: 
C = ^ /°^"+! exp[(A - X^%] fl ^ ^^ 
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(9) 



Afj + Aq,+i 

where x = Aq,/Ao+i G (0,1). To upper-bound C, note that the exponent increases mono- 
tonically with x and its maximum is hm2,^i_(2x^ In [x])/(l — x^) = —1. This bound finally 
gives us C < 4/e k. I AT. D 

As a final comment, we note that for general graphs the average time complexity of this 
algorithm is dominated by the computation of the eigenvectors and eigenvalues, which gives 
us the following theorem: 

Theorem II. 2. The average time complexity for spectral decomposition of the master equa- 
tion is 0{N^) for a graph of N vertices^. 

C. Spectral Sampling 2: Modified embedded Markov chain method 

The efficiency of the SSA is dependent on the relaxation time of the embedded Markov 
chain (EMC). We use this observation to modify the basis in which the EMC is simulated. 
The standard method of executing a random walk is to consider the transition between 
adjacent states, each of which is localized at a vertex of the CTMM. However, correct 
simulation only requires that these states form a basis, not that they are orthogonal. If 
we can choose a set of states which are increasingly likely to appear during the simulation 
of the Markov chain, we are unlikely to make repeated visits to the same state. In order 
to identify such a basis starting from an initial state \i), we first identify the transition 
matrix for an embedded Markov chain that correctly describes the given CTMM. Consider 
the vertex set V = {1,2, . . . , N} and the basis constructed from V, B = {\1), |2), . . . , |A^)}. 
At any given time t, let the state of the time-evolved Markov chain be {ipit)) = X]i=i "^jK)- 
Let Vt = {i E V\ipi 7^ 0} be the vertex subset populated by the current state vector. We 
construct the EMC for the subgraph induced by Vt at each step of the algorithm. Given the 
projection of the Kolmogorov matrix M over the vertex set V, choose r = max{—Mii\il)i ^ 0) 
to be the effective rate of transition to the next state and choose an exponentially distributed 
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FIG. 3: Schematic of the EMC-based spectral method. Vertices in black are the currently occupied 
nodes. Simulation advances the system state as a discrete mixture until such time as the state has 
relaxed to its slowest eigenstate \\min)- At each step, direct transitions to the absorbing vertex 
(grey) are computed according to the Kolmogorov matrix. 

random time step r with mean waiting time 1/r. Then, Lt = — (1/r) * M is the Laplacian 
governing the EMC and Qt = I ~ Lt is the effective transition matrix at that time step. 
The next state vector is chosen to be 10) = Qtlipit)). The reason for choosing this particular 
value of r is to ensure that no term in Qt becomes negative, a necessary condition for a 
transition matrix. 

Theorem II. 3. The choice of next state is consistent with the master equation governing 
the CTMM. 

Proof. Rewrite the master equation in terms of the {\'4')i \<P)} basis (where the other N — 2 
linearly independent basis vectors can be chosen arbitrarily): 

*> r(l^hl-I 



dt \ r 

= riQt-I)\^) 

= ri\<P)-m (10) 

Since there is a unique decomposition for any vector in terms of a linearly independent basis 
set, Eq. [TOl proves that starting from \ip) the next state is uniquely determined to be |0). D 

In general, the next state |0) will have a total probability P = ^2- 4>i < 1, due to possible 
transitions out of the subgraph. We check if that is the case by generating a [0, 1] random 
variable X to compare with P. If X < P, the next state is still trapped inside the subgraph 
and we normalize it as ip{t + t) = 1/P|0). \ip{t + t)) is used to generate the next state in 
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Algorithm:Next State 






Input: Current state vector ?/;) = X^j ^i «) 
Output: Next state (^) = ^i(l>i\i) and rate 


r 


for i eV do 






if v/;, > AND r < -M^ 
end 


then 




end 






Compute the next state (t>i <- 
Return {{(p) ^Ei^iK), r}; 


-E,(5.. + ;M,, 


w. 



Algorithni:Check Convergence 

Input: Next state \(j)), present state j-i/;), rate r 

Output: The first passage time t 

EigeriMode ^- Yes; 

P^O; 

for 2 e F do 

P ^ P + (/>.; 

if {(pi — ^pi\ > fipi then 
EigenMode ^- No; 

end 
end 

Generate a uniform [0, 1] random variable U; 
if EigenMode = Yes then 

A™„ ^ r * (1 - P) ; 

Return t *^ t — y^— In U; 
end 
else 

Compute the next time step t < In L'^; 

t^t + T; 

Generate a uniform [0, 1] random variable X; 

if X < P then 

{\tp),r} ^ Next State(i I (/))); 
Check Convergence(|'i/'),|(/)),r); 

end 

else 

Return f; 

end 
end 



FIG. 4: Pseudocode for Spectral method 2 
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the simulation. This sequence will continue until the state has relaxed to its slowest eigen 
vector \\min), such that M|Amm) = — Amm|Amm) to withiu a user-defined relative error e. 
Once that state is achieved, we just need one more exponentially distributed random sample 
time r with mean 1/Xmin to escape the network. 

SSA chooses a stochastic trajectory by sampling both the next neighbor and the time for 
the next step at random. The EMC method, on the other hand, evolves deterministically in 
our modified basis and only the time between transitions is stochastic. At each time step, 
transition to the absorbing boundary states is governed by the matrix elements connecting 
each of the transient states to the absorbing boundary. The advantage of such an approach 
is that it allows us to automatically compute the most slowly decaying eigenvector during 
the simulation. For completeness, we note the following result: 

Theorem II. 4. For a graph of degree bounded by d and V of cardinality N, each step of 
this algorithm takes 0{N * d) time. 

D. Automated discovery of trapped subgraphs 

As previously mentioned, stiffness in Markov model graphs results from repeated visits 
by a typical random trajectory to a small subset of vertices of the entire graph. Since the 
performance of spectral methods is sensitive to the size of the vertex set, it would prove useful 
if we could somehow identify these "trapped" subgraphs for stiff Markov models and apply 
spectral methods directly to those. In this section we present one such method, which we 
call "Automated Discovery" (AD) and which we show to be formally applicable to arbitrary 
bounded-degree graphs. 

Let there be a state space S over which a CTMM is defined and consider a subgraph 
G{V, E) with vertex set V C S* and edge set E. Starting from an initial state i G V, we are 
interested in the time Tp{i) to first passage out of V . Consider the subgraph H^ induced 
by the vertex set f/j C \/ visited by a random trajectory executing the SSA random walk 
before it escapes V and let Ni = \Ui\ be the cardinality of f/j. If Tpi is the number of steps 
a SSA random walk takes to escape V , then a Markov model will show simulation stiffness 
whenever the expected values satisfy, 

E[N,] « E[Tf,] (11) 
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since this would imply certain vertices in f/j are being visited repeatedly. AD works by 
progressively sampling larger regions of V until it identifies a subgraph Ki induced by a 
vertex set Wi <^V such that Ui <^Wi. Once K^ is identified either of the spectral methods 
can be used directly over Ki. The method will be efficient as long as \Wi\ ~ |f/j| and 
the number of steps taken to identify K^ is comparable to the computational cost of using 
spectral sampling over Ki. Formally, Hi can be exactly discovered by repeatedly enlarging 
the discovered graph to include the last vertex outside Ki visited by the trajectory. If 
spectral sampling for a graph of vertex set size A^ works in time f{N), this procedure 
would ensure that implementing spectral sampling in conjunction with automated discovery 
takes 0{Ni * f{Ni)) steps. The stiffness condition (Eqn. [TT]) would usually ensure that this 
procedure is still efficient. *B* Another method for discovering the trapped subgraph would 
be to implement the SSA random walk for a specified number of steps >S'(A^) (depending on the 
size of the vertex set A^). Since eigenvalue methods are in general 0{N^), we can implement 
SSA until 5'(A^) < C * A^^ for some constant C, to discover the trapped subgraph K and 
then use spectral sampling to escape the discovered graph. This alternative approach could 
be less efficient in some circumstances, but would guarantee that the overhead for spectral 
sampling is no more than a constant factor beyond that of the standard SSA. Fig. [5] shows 
the pseudocode for implementing AD for a given graph by this method. The algorithm 
generates a sample trajectory using SSA till such time that the trajectory spends 0{N^) 
steps within a trapped graph K of vertex set cardinality A^ = \K\. Then either of the spectral 
methods described in section IIIBI or III CI are used to sample the first passage outside K, to 
a vertex i. In general the state of the system at the time of first passage outside K will 
be a discrete probability mixture of more than one vertices. In such cases, the vertex i is 
randomly selected in accordance with the appropriate probability weight. The algorithm 
then resumes SSA execution over the enlarged graph K[J{i}.*E* Further investigation is, 
however, required to search for algorithms that may further improve the performance of AD. 
In section IIVBI we prove that for at least one important class of graphs, namely models of 
chemically reacting species, we can indeed reduce the time complexity to its optimal value 
to within a constant factor, i.e., 0{f{Ni)). 
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Algorithm: Automated Discovery 

Input: Discovered vertex set K, current state i, time elapsed t 
Output: Final state vertex i ^ V , first passage time t 

while i £ 1/ do 

size ^ \K\] 

steps ^- 0: 

if steps < c * size^ then 

Generate SSA next state i; 

Update t; 

steps <— steps + 1; 

if i ^ K then 

size <— size + 1; 
end 
end 
else 

Do Spectral Sampling over K; 
i ^- Next state outside K; 
t ^- updated time; 
K^K[J{i}; 
end 
end 
Return i.t: 



FIG. 5: Pseudocode for Automated Discovery 

III. APPLICATION WHEN THE SUBGRAPH IS KNOWN: FRACTURING 
BOND NETWORKS 

A. Stiffness in SSA for bond networks 

In order to validate the methods, we instantiate them for some specific challenging sys- 
tems. We begin by demonstrating the non-AD variants of the methods for the problem 
of sampling the time required to break a network of bonds. This problem is an example 
of a stiff SSA on a generally small graph. It is also of independent interest because of its 
importance in modeling self-assembly processes on long time scales. Given such a system, 
we are interested here in the first passage time to the subset of states corresponding to 
disconnected graphs Vb C S. Since each bond can occur in two states, intact or broken, a 
network of d bonds can be represented as a vertex on a unit hypercube in d dimensions. 
The state space generated by the bond network before it becomes disconnected will usually 
be a truncated unit hypercube. An N-cycle C^ generates the simplest non-trivial example. 
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where the absorbing boundary is placed at all points on the hypercube at distance 2 from 
the fully-connected state. Fig{T](c) illustrates this absorbing boundary for C3. Given a d- 



bond network, we will represent the /i*'' bond-breaking rate by h^ and association or binding 
rate by a^. It is convenient to represent a vertex on this hypercube by a binary d-tuple 
i = {id, . . .i^, . . .ii}, where i^ = implies that the /i*^ bond is intact (see Fig. [T](c) for the 
graph corresponding to a trimer). From here on, we will use the notation ft = {(5^^, . . . , 61^} 
for the vector describing a state of the model with only the fi^^ bond broken. For such a 
graph, the time complexity of each SSA step is 0{d). In the rest of this paper we will use 
this model of truncated hypercubes to represent bond networks. Morris and Sinclair— have 
proven that in the case of unweighted graphs, a random walk on a hypercube truncated 
by a hyperplane relaxes to equilibrium in polynomial time bounded by 0{d)^^'^~^'^ for any 
e > 0. However, as we have argued in the introduction, the mean hitting time, i.e., the 
number of random walk steps between a pair of vertices, can be extremely sensitive to the 
parameters governing the walk. We formalize this observation in the Theorem IIII.ll below, 
which bounds the expected number of SSA steps before the network is disconnected. Let 
r = Min{a^/bu\fi, u E {I, . . . d}). 

Theorem III.l. The expected number of SSA steps required to break a k-connected network 
with k > 1 and r > 1 is Q[r''^^). 

A detailed proof of the theorem is provided in the appendix. Figs. [6] and [7] provide an 
empirical demonstration of the theorem. Fig. [6] analyzes the number of steps required in 100 
trials of the SSA algorithm for simulating the breakage of a set of cycle graphs Cn ranging 
in size from three to seven. Each model was examined using ratios of forward to backward 
rate from 1 to 20 in increments of 1. Breakage times for the cycle graphs increase linearly 
with rate ratio, although they also fall monotonically with cycle size (Fig. [6]^a)). Fig. [7] 
analyzes the number of steps required to break k-connected hypercube graphs of dimensions 
k = {2, 3, 4, 5}. Fig.[7](a) also shows that the slope of a log-log plot approaches the predicted 
exponent k — 1. Fig. [6](b) and[71^b) suggest why a spectral approach might be effective — 
as the reaction rate increases, steps to first passage behave more like a geometric random 
variable (as mean — >■ 00, standard deviation -^ mean), as expected for a slowly decaying 
eigen mode of the transition matrix. More detailed explanations of the simulation protocol 
for these figures is provided in section IIIICI 
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B. Master equation for bond networks 



1. Spectral analysis for breaking the Cn network 



We illustrate the Master Equation spectral method using the cycle graph Cn as an 
example. This is a graph of N vertices and A^ edges, connected together in a loop such 
that exactly two edges need to be removed to disconnect the graph (called a separation 
pair). The state space is 5* = {0} IJu^ilA} Uu=2 Ujy<u{A + ^}- ^^ this case, the subspace 
Vb = IJ/(=2 Uiy</t{A+^} defines the absorbing boundary and the subspace Vc = S — Vb defines 
the space of transient states. We begin with the most general form for M, the projection of 
W onto the subspace Vc. 
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In what follows, we assume that all the eigenvalues of M are negative (as they must be 
over the subset of transient states since J2n ^nm < ensures any positive probability density 
decays to zero) and that the set of rates {ai} and {bj} are positive (ensured by property 1 
of W). For economy of notation, let us define kn = cin + ^„i+n ^m- Also, in what follows we 
assume that the bond indices have been labeled such that ki < k2, ■ ■ ■ ka < fca+i . . . < k^- 
In the case of a Cjy network, the Tf{i) distribution can be efficiently sampled due to certain 
properties of the eigenvalue distribution and the form of the eigenvectors. Since the sampling 
technique for a general CTMM will be an extension of this special case, it will be helpful to 
illustrate the method by investigating the spectral properties of Cn- The next few results 
establish bounds on the eigenvalues of M as a special case of the interlacing eigenvalue 
theoren>2^. 

Theorem III. 2. The A^ + 1 eigenvalues {— Aq > — Ai > . . . — Xn} of the matrix M in Eq. 
UM satisfy the following: 

1. If ki = ki+i then —ki is an eigenvalue of M. If n such diagonal elements are identical 
then the eigenvalue is {n — l)-fold degenerate. 

2. There is at least one eigenvalue of M in the interval e^ = (— /cj, — fcj+i). 

Proof. The eigenvalue condition Det\M — A/| =0 implies that the eigenvalues A are the 
zeroes of an [N + l)*'^ order polynomial: 

N N 



1=1 j^i 



(13) 



i=l 



We establish bounds on the roots by calculating the sign of /(A) over the set of points 
{-ki,...,-kN}- 
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1. Each term inside the summation sign in /(A) contains n — 1 factors of (fcj + A). Hence 
—ki is an (n — l)-fold degenerate eigenvalue. In what follows we assume that the 
remaining kj are all distinct. 

2. The sign of the function /(A) at A = —ki is (—1)*. Hence e^ encloses at least one root 
of /(A). D 

The eigenvectors of M {{ipa)} are mutually orthogonal for the set of non-degenerate 
eigenvalues. In the case of non-degenerate eigenvalues (— Aq, 7^ —km), these eigenvectors are: 

^a,n = {n\^a) = N ""^ (14) 

i^afl = ma) = N^ (15) 

where A''^ is a normalization constant. For degenerate eigenvalues, an orthogonal basis 
can always be chosen using the Gram-Schmidt procedure. As will become apparent later, 
however, these eigenvalues do not contribute to the sampling in the case of a first-passage 
problem beginning with the unbroken loop i = 0. 

Theorem III. 3. The envelope curve g{t) defined by our method is identical to the first 
passage density for a C^ network. 

Proof. Beginning with an unbroken state at t = 0, the probability the model occupies a 
given state n at time t is given by: 

N N 

Pn{t) = 7r„ ^ ^a,n^afi exp[-\at] = ^ Ca,n expf-A^t] (16) 

a=0 a=0 

where i/ja,i is an eigenvector of M. with eigenvalue — A^. Note that only those Aq, 7^ ki 
contribute, for otherwise ipafi = 0. Assuming A^ < \a+i, the coefficients satisfy Ca,n < 
for a > n. Since the partial sum SN,n = Zla=o Ca,n = J2a=o^n'ipa,n'ipa,o = vr„(n|0) = , 
all other partial sums satisfy Sp^n = X]a=o '^a.™ — ^- These observations provide a means 
of decomposing the probability density into the following discrete mixture with positive 
coefficients: 

N 
Pn{t) = ^ Sa,niexp[-\at] - exp[-Aa+it]) (17) 

a=l 

Since 6„ > 0, the combined rate of decay to any one of the broken states is given by: 

dp^ _ >r^ dp(n,m) 



j2^-^=j:sja{t) (18) 



dt ^-^ dt 

{n,m) a=0 
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where, 



m n^m 

and 






Ut) = ■ "^! (exp[-A^t] - exp[-A„+it]) (20) 

Aa+l — Aq 

D 



C. Simulation models used for bond networks 

Although our methods can in principle sample escape times from any subnetwork of a 
CTMM graph, we have validated them here for the specific case of breaking networks of 
bonds due to the importance of this problem for self-assembly modeling. In rule-based 
models of self-assembly, a simulation is initialized with a set of assembly subunits, each 
with a complement of pre-specified binding sites. As the simulation progresses, the system 
evolves into a state with an assembly of disjoint networks. The binding interactions between 
two disconnected pieces of the network usually occurs on a slower scale than individual 
bond breaking reactions^. For bi-connected networks, however, the association rate within 
a connected network is much larger than the bond breaking rate since there is no entropy 
penalty in associating bonds between constituent subunits. Such models allow for a natural 
partitioning of the state space into subgraphs corresponding to the bi-connected components 
of the entire network. The first set of experiments that we performed were on such bi- 
connected networks. The simplest non-trivial example of a bi-connected bond network is 
the graph generated by an A^-cycle (Cat). More complicated networks of N bonds can 
be viewed as special cases of a truncated unit hypercube in A^ dimensions. We therefore 
carried out simulations for the network generated by Cn as well as the full hypercube [Z^). 
Theorem IIII. II guarantees that the expected number of SSA steps for a /c-connected network 
of d bonds is P{d,k)Q{r^~'^), where P{d,k) is some combinatorial function dependent on 
the topology of the network. 

Each model is parameterized by a rate of bond formation, a, and a rate of bond break- 
ing, b. These values were varied in different simulations. Each of the bonds had different 
binding/breaking rates but the ratio was maintained at the same order of magnitude for 
each simulation. Specifically, for a d bond network b^ = 6(1.0 + 0.05fi/d) and a^ = a. These 
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slight variations in rates from bond to bond were used to avoid giving our methods an unfair 
advantage, as they will generally be more efficient when the transition matrix has degenerate 
eigenvalues. 

D. Experiments 

We conducted a series of simulations to determine the performance of the SSA, Master 
Equation, and EMC methods for bond network first-passage times. All simulations were 
implemented in Mathematica. Run time simulations were executed on a Macintosh machine 
with a 1.8GHz G5 processor and 512 MB RAM. For the EMC based spectral method, we 
allowed each component of the state vector to converge within a relative error of e = 0.01. 
Each data point reported was the average over 500 simulations except for run time data, 
which were averaged over 100 simulations. 

We first examined the efficiency of the Master Equation method by assessing the number 
of rejection steps needed to sample each first-passage time. We carried out simulations for 
cycle graphs (Cat) varying the cycle length from 3 to 7 and the rate ratio a/h from 1 to 20 
in increments of 1. These experiments were then repeated for unit hypercubes {Z^) with 
dimension varied from 2 to 5 and rate ratio a/h from 1 to 10 in increments of 1. For each 
condition, we recorded the number of rejection steps required for each of 500 simulations 
and computed the mean and standard deviation across the 500 trials. 

We next examined the number of steps required by the EMC method for sampling times 
to network breakage. We examined the same models as those used to validate the Master 
Equation method: cycles of length 3 to 7 with rate ratios from 1 to 20 in increments of 1 and 
hypercubes of dimension 2 to 5 with rate ratios from 1 to 10 in increments of 1. We similarly 
recorded the number of EMC steps required for each of 500 simulations and computed the 
mean and standard deviation across the 500 trials. We also computed the fraction of models 
that reached the first passage time before relaxing to the slowest decay mode. 

We next tested the total run time of each of the three methods on a broader set of 
parameter ranges. We evaluated run times for each method for cycle networks of sizes 3 
through 7. We performed two sets of evaluations for each. The first set varied the rate 
ratio a/h from 500 to 5000 in increments of 500 to provide a broad view of the relative 
run times of the three methods. These numbers span ranges of values likely for protein 
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assemblies. For example, Zlotnick et al."^^ have estimated a binding free energy of AG* = 4.2 
kcal/mole for ODE based simulation of the kinetics of the Hepatitis B virus, which yields 
a/h = exp {AG/ RT) ~ 1200. We then examined ratios of SSA to Master Equation and SSA 
to EMC run times for each data point based on averages over 100 simulations per parameter 
set. In a second set of experiments, designed to give a finer view of where each method is 
dominant in parameter space, we varied the rate ratio a/h from 30 to 300 in increments of 
30. We then identified the most efficient of the three methods for each point, again using 
averaged run times over 100 trials per data point. 

We then performed analogous experiments for hypercube graphs in order to test per- 
formance on networks with higher connectivity. For each graph Z2 to Z5, we carried out 
simulations for rate ratio a/h from 3 to 30 in increments of 3. We were limited to small 
ratios because the SSA method becomes prohibitively costly for high- connectivity networks 
at higher ratios. Each simulation was repeated 100 times to yield average run times for each 
parameter set and for each of the three methods. For each parameter set, we computed the 
ratio of run times for SSA versus Master Equation and SSA versus EMC. We further evalu- 
ated which of the three methods produced the shortest average run time for each parameter 
set. 

E. Results 

We first present results on the efficiency of the rejection sampling scheme for the Master 
Equation method. The expected run time of the method is proportional to the expected 
number of trials needed to produce a successful sample. A low number of steps is therefore 
preferable, with a value of one being ideal. Fig. [8]^a) shows the rejection ratio for cycle 
graphs C3 through C7. The mean number of rejection steps is consistently below 1.5, as 
expected from theorem IIII.3I and III.ll The number of rejection steps drops with increasing 
rate ratio but increases with increasing cycle length. These results together establish the 
efficiency of the method. Fig. [8]^b) shows that the method is also robust, with standard 
deviation consistently below 0.9 for the experiments shown here. The standard deviation 
also decreases with increasing rate ratio but increases with cycle size. 

Fig. W^a) shows mean numbers of rejection steps for hypercube graphs. Since the envelope 
curve for hypercubes is not exact, these experiments provide information about how well 
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FIG. 8: Number of Rejection steps for the Master Equation method until first passage for the 
network generated by Cjy (a) Average number of steps (s) (b) Standard deviation ^y{6s'^) 
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FIG. 9: Number of Rejection steps for the Master Equation method until first passage for Z^v (a) 
Average number of steps (s) (b) Standard deviation ^/{5s'^) 

the method performs for more general networks. The hypercube graphs also yield mean 
numbers of rejection steps consistently below 1.5. The number of steps generally falls with 
increasing rate ratio. Fig. Mjo) shows the method also to be robust for hypercube graphs, 
with standard deviations consistently below 1.0 and following similar trends to the means. 
Next, we performed identical experiments to study the performance of the EMC-based 
spectral method. Fig. [TOT a) shows mean numbers of EMC steps for cycle graphs. The 
number of steps remains consistently below 6. The values rise sharply at the lowest rate 
ratios, but quickly level off to approximately 4-5, depending on the cycle length. Figs. [10] (b) 
and (c) provide the explanation for this feature. For small rate ratio, multiple eigen modes 
are responsible for the decay (see part (c)), which corresponds to increasing EMC steps before 
first passage, similar to SSA. However, as rate ratio increases further, relaxation time to the 
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FIG. 10: Number of EMC steps until first passage for the network generated by Cn (a) Average 
number of steps (s) (b) Standard deviation ^/{6s'^) (c) Fraction of times the trajectory escapes 
before relaxing to the slowest decay mode. 

slowest eigen mode becomes smaller than the average first passage time and the method 
automatically samples breaking times according to the slowest eigen mode (Fig fTW c)). This 
feature is evident in part (b) of the figures, which measure the standard deviation. At high 
rate ratio the "trajectory" is almost deterministic, i.e., it always takes the same number of 
steps to break the network. This happens because the state almost always relaxes to the 
slowest eigen mode before escaping the subgraph, hence giving a low value for a at high rate 
ratio. 

Fig. [11] shows comparable results for hypercube graphs. Fig. [TlTa) shows that mean 
numbers of steps drop substantially between ratios 1 and 2 but quickly level off to an 
apparent constant for each graph. The number of steps increases with increasing hypercube 
dimension. Figs. [TlT b) and (c) again show that the method has high variability for low rate 
ratios, where multiple eigen modes contribute significantly to the time distribution and the 
method must behave similarly to the standard SSA. At higher ratios, though, the slowest 
mode quickly dominates and the number of steps required becomes highly reproducible. 

We next examined total run times of the three methods, beginning with the cycle graphs 
C3 to Cj. Fig. [12] plots results of the EMC and Master Equation methods relative to the 
basic SSA. Fig. [12(a) shows ratios of run times for standard SSA to the Master Equation 
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FIG. 11: Number of EMC steps until first passage on Z^ (a) Average number of steps {s) (b) 
Standard deviation ^y{^s'^){c) Fraction of times the trajectory escapes before relaxing to the slowest 
decay mode. 

method. The ratio grows rapidly with increasing rate ratio, although it falls with increasing 
cycle size. Fig. [T2](b) shows the comparison of SSA to the EMC method. The SSA:EMC 
ratio likewise peaks for large rate ratios and small cycle sizes. The EMC method appears 
generally superior to the Master Equation method, beginning to dominate at a lower rate 
ratio and reaching a higher peak. Fig. lT2\ c) shows for a narrower rate range where each 
of the three methods dominates. The EMC method is the fastest for most of the range 
examined, with the standard SSA superior at the extreme of low ratios and large cycle sizes. 
We then examined run times on the hypercube graphs Z2 to Z^. Fig. [TST a) shows run 
time ratios for SSA versus the Master Equation method and Fig. [T51(b) for SSA versus 
the EMC method. Both spectral methods show sizable improvements over the pure SSA 
method for larger rate ratios and higher hypercube dimensions. SSA appears much more 
sensitive to rate ratio as compared to the spectral methods. Even for a rate ratio of 30, the 
spectral methods were more than three orders of magnitude more efficient than SSA for Z5. 
For hypercubes, unlike cycle graphs, the Master Equation method appears generally more 
efficient than the EMC-based method, even for small rate ratios. FigJT3](c) shows where each 
method is dominant. The Master Equation method is dominant for most of the parameter 
range examined, with the SSA method superior at the limit of lowest degree and smallest 
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FIG. 12: Comparative run times for the network generated by Cn (a) Ratio of SSA to Master 
Equation run times (b) Ratio of SSA to EMC run times (c) Region in 2D parameter space where 
each method is optimal 

rate ratios and the EMC dominant for low degree and higher rate ratios. This result is 
expected from Fig. [7|(a), since the average number of steps seems to increase monotonically 
with the connectivity of the graph for the EMC-based method. The efficiency of the Master 
Equation method, on the other hand, depends primarily upon the size of the complete graph, 
since matrix diagonalization is the eventual efficiency bottleneck. 
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IV. APPLICATION WITH AUTOMATED DISCOVERY (AD): NUCLEATION- 
LIMITED ASSEMBLY 

In this section, we apply the AD variants of the methods to a different system type 
also motivated by self-assembly modeling. The rate of a self-assembly processes is often 
limited by the time required to build the first stable multi-subunit complex, called a nucleus, 
which then acts as a seed for assembly of the rest of a larger structure. Because partially 
formed nuclei are unstable, considerable trial-and-error may be needed before one reaches 
completion. The time to complete a single nucleus can thus be orders of magnitude longer 
than the inter-subunit binding rate. These nucleation-limited assembly systems are one 
example of the broader class of stiff models for chemically reacting species. The state 
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space of any such a system can be represented as a lattice corresponding to the populations 
individual species. These models are similar to those treated in earlier studies of accelerated 
SSA methods^ii^. We apply one such model, representing the formation of simple trimeric 
nuclei, to demonstrate and evaluate the AD variants of the spectral methods. 

A. Integer lattice models 

The second model we consider is again an assembly of bond networks where monomers 
{m) with two identical binding sites combine to form dimers {d) and trimers (t). In order to 
show stiffness with respect to a single parameter, the trimers were assumed to be completely 
stable. If the total number of monomer subunits is A^, the state space is the intersection of 
the plane Nm + Nd + Nt = N with the positive octant of the 3 dimensional lattice formed 
by integer counts of the monomer (Nm), dimer (A^d) and trimer (Nt) populations. Let us 
represent each vertex of this graph by the pair {Nt, Nd). The reaction propensities a^v* V ^° 
reach the vertex [Nj-, N'^) from (A^, A"^) are (to within an overall constant) 

(21) 
(22) 
(23) 
(24) 

where 1/v is an entropy penalty due to the finite volume of the system. We initialize the 
system at the state (0, 0) for a given monomer count A^ and sample the first passage time 
until the trimer count reaches a given value. This system will show stiffness if the parameter 
p = N/v is small. For small p, which corresponds to low concentration and/or small binding 
energy, trimer formation will be much slower than dimer breaking/binding reactions. 

B. Automated Discovery for integer lattice 

Efficient simulation over an integer lattice, where one pair of species react on a much 
faster timescale than the others requires a partitioning of the entire lattice into subgraphs 
with fixed trimer count (since trimer formation occurs on a much longer timescale than 
monomer-dimer reactions). These subgraphs are simple paths with vertex set V{Nt) = 
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Algorithni:Automated Discovery 








Input: Discovered vertex set K, current state i 


, time elapsed t 


Output: Final state vertex i ^ V , 


first 


passage 


time t 


Do Spectral Sampling over A'; 
i ^- Next state outside A; 








t ^- updated time; 
if i e F then 

A ^- Enlarge Grapli(A, «); 

Automated Discovery(A, i,t); 
end 








else 

Return i,t; 








end 









Algorithm:Enlarge Graph 






Input: Discovered vertex set A = 


= |0,1,. 


. . A2}, current state i 


Output: Enlarged vertex set K 






N2 <- l + [m* A2]; 






Return A ^ {0, 1, . . . A2}; 







FIG. 14: Pseudocode for Automated Discovery for a simple path 



{(Ai,0), (Ai, 1), . . . (At, [(A — 3Ai)/2])}, where At represents the fixed trimer count and 
square brackets represent the largest integer smaller than the enclosed expression. Fig. [H] 
presents a procedure for implementing automated discovery on such graphs which works 
in optimal time, to within a small constant factor (the vertices are represented by dimer 
count for simplicity). At each step of automated discovery the method enlarges the graph 
by a factor proportional to its present length. The scale factor m can be optimized for any 
given sampling algorithm to optimize run time. For example, if a given spectral sampling 
algorithm works in time /(A) = A", where A is the cardinality of the vertex set; total 
number of steps used in automated discovery of a graph sized Aj can be bounded from 
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above by the following quantity S'a?., 

Ana- \]VA 



l+log^[iVi] ^Ta^2a 



Sm, = J] m"" < 



m" — i 

n=l 

< CN^ for m = 2i/("+^) (25) 

where, C = 2^"/"+^/ (2"'"+^ ~ !)• In general the Master Equation based method works in 
0{N^), however for simple paths the Kolmogorov matrix is sparse and effective power is 
expected to be more like a = 2. Since SNi is more sensitive to deviations for smaller values 
of m, we chose m = 1.3 > 2^^^ in the experiments reported here. 

The method reported here can in principle be generalized for arbitrary lattice graphs in 
d dimensions. The size of the discovered graph in such cases would overestimate the actual 
trapped graph by a factor of m'^, for a scaling factor m. For small dimensions, this may 
still be more efficient than the method discussed in section IIIDI which exactly samples the 
trapped subgraph. 

C. Experiments 

We performed two sets of experiments to compare the performance of spectral methods 
with SSA. The first set of experiments compared the Master Equation method implemented 
in conjunction with Automated Discovery for the trimer model with SSA. Each experiment 
compared the ratio of run times for sampling first passage times to reach a trimer count 
Nf = 100, starting from an initial monomer count N. The state space was partitioned into 
subgraphs corresponding to fixed trimer counts and AD was used to identify the trapped 
regions for spectral decomposition. We then performed a total of 50 comparative run time 
simulations varying A^ from 1000 to 9000 in steps of 2000 and varying p from 10~^ to 
1.9 X 10~^ in steps of 2.0 x 10^^. All run times were averaged over 50 samples. The 
scale factor for AD was set at 1.3. The second set of experiments compared the run time 
ratio for the EMC based method and SSA for first passage time to reach a trimer count 
Nt = 100, starting from an initial monomer count N. The state space was again partitioned 
into subgraphs of fixed trimer counts. AD was not required for these simulations since the 
method automatically selects the trapped region of the subgraph according to the evolving 
probability distribution. We then performed a total of 25 comparative run time simulations 
varying A^ from 1000 to 9000 in steps of 2000 and varying p from 10~^ to 0.9 x 10^^ in steps 
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FIG. 15: (a) Comparative run times for first passage for 10 trimer counts. Ratio of SSA to ME- AD 
run times (b) Comparative run times for first passage for 100 trimer counts. Ratio of SSA to 
EMC- AD run times 

of 2.0 X 10~^. All run times were averaged over 50 samples. For the spectral method, each 
component of the slowest eigenvector was allowed to relax to within a relative error of 0.01 
and an absolute error of 1.0 x 10^^. 

D. Results 

We first present results for the Master Equation method run times as we vary the stiffness 
parameter p. Fig [T5](a) shows the behavior for 5 different initial monomer counts N . Small 
p values correspond to a stiff model, since the average dimer count varies approximately 
as Nd ~ pNm and the ratio of the rate of dimer formation to trimer formation varies as 
~ Nm/Nd- Fig. [TSlfa) demonstrates the efficiency of the Master Equation method. The 
method shows large gains in the domain of small p and small A^, with relative performance 
dropping rapidly with increasing p and more slowly with increasing N. Next we present 
results for the ratio of SSA/EMC method run times as we vary p and A^. Fig. [TST b) shows 
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the behavior for 5 different initial monomer counts A^. The EMC method is effective at 
substantially larger values of p than is the Master Equation method. As is to be expected, 
the spectral methods do not scale as well as the usual SSA method with increasing N . Even 
for relatively large networks, though, the performance gain obtained by spectral sampling is 
appreciable. The reason for this is that for most cases the slowest eigen mode is reasonably 
well approximated by a vector populating only a small fraction of the subgraph vertices. As 
a result we can look at the EMC method as a generalization of other accelerated sampling 
schemes which only use one vertex, the mean value of the slowest decay mode, as in the 
PEA based methods. 

V. DISCUSSION 

We have investigated the problem of efficiently simulating stochastic reaction models and 
introduced two methods for accelerating sampling on problems characterized by multiple 
time scales. Both methods are based on spectral analysis of CTMMs equivalent to the 
SSA model. We have applied these methods in the present work to two special cases of 
these models that are important to simulations of molecular self-assembly: sampling times 
to break multiply-connected bond networks and simulating growth in nucleation-limited 
assembly systems. Collectively, these two applications demonstrate the use of the proposed 
spectral methods on small CTMM graphs known a priori and on automatically discovered 
subgraphs of large CTMMs. We have shown theoretically and empirically that the new 
methods are substantially more robust to variations in the ratios of reaction rates than is 
the basic SSA method for these problems. 

While we have applied these methods here to models used in self-assembly simulations, 
the basic methods can be expected to have much broader application. Both methods can be 
applied to sample first passage times for any arbitrary subset of states of any SSA CTMM 
graph. Both can also be applied to sample escape times from any subgraph of such a 
graph, using automated discovery to identify "trapped" regions of the CTMM graph. The 
latter distinction is important because CTMM graphs for complicated biological systems are 
generally far too large to represent explicitly. These spectral methods might be extended 
to incorporate "on the fiy" graph construction techniques, like those used by rule-based 
methods widely used for SSA simulationa^*^. The EMC method, especially, would seem to 
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be a candidate for such an extension. For example, if at each iteration, instead of adding 
all the possible next neighbors to the system state, we add only a subset of them depending 
upon their transition probabilities then we will get a natural, non-local generalization of the 
SSA. Such an approach could provide a precise and general method for pruning full SSA 
graphs to achieve more efficient pathway sampling in extremely large state spaces. 
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APPENDIX A 

In this appendix, we prove Thm. IIII.ll which helps us establish the relative efficiency 
of the spectral methods to the standard SSA. Before we prove the theorem, we need to 
establish some preliminary results. Let r = Min{a^/hy\^, z/ G {1, . . . c?}). To construct the 
transition matrix Q, SSA identifies the negative of the diagonal element of the Kolmogorov 
matrix —Wn^n = ( Xl/3 '^/3'^/3 + (1 " np)bi3 ) as the inverse of the mean waiting time at each 
SSA step and the matrix L^^ = —Wm,n/Wn^n as the graph Laplacian. Therefore, Qm,n = 
Sm,n — Ljn,n- Siucc SSA simulatcs a periodic Markov chain Q, the graph is bipartite and 
the two step chain Q^ is reducible into Ql^en ® Qldd- Here, Ql^^n is the projection of Q^ 
over the subspace of states with an even number of bonds broken Veven and Ql^ is the 
projection over Vodd = S — Veven- Since both Ql^en and Ql^^ are irreducible and aperiodic, 
the ergodic theorem applies to each one separately and if |n) = J2ies ^*l^) ^^ ^^^ eigenvector 
of Q with eigenvalue 1, the vectors llle) = J2iev ^■^1*) ^^^ l^o) = J2iev ^*l^) ^^^ ^^^ 
equilibrium distributions for Ql^en and Ql^d^ respectively, up to a normalization constant. 
To bound the mean hitting time T50, from to the set Vb, we first apply the common 
technique of constructing another graph with vertex set V = V^|J{&}, where all vertices in 
Vb are truncated to a single vertex b and Vc = S — Vb- The edge weights for edges from 
i E Veto b are chosen as Qb,i = J2j(^v Qj^^ which will leave T^q unchanged from that of the 
original graph. We must further specify the edge weights from b to any states with k — 1 
broken bonds. In order to ensure that the Markov chain still obeys detailed balance, we 
require that Qi,b/Qj,b = {Qb,i * T^i)/{Qb,j * t^j) and 'YliiJ.bQi^b = 1- The resulting modified 
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graph will then have the same hitting time Tbo as the original graph. 

We next need three auxiliary results about properties of the resulting graph in order to 
prove our main theorem. 

Lemma A.l. For an ergodic Markov chain, the cover time Cij = Tij + Tji between any two 
states i and j satisfies^ 

E[Cij] = E[T,j] + E[Tj,] = l/injPr[Tjj > T,j]) (Al) 

Lemma A. 2. The transition matrix Q satisfies the following conditions: 

1. Ifi and j = i+ft are two neighboring states with n andn+1 bonds broken, respectively, 
then Qj^i < {n* r)^^ for any n > 0. 

2. For any initial state i containing n broken bonds, the n-step transition probability to 
is bounded from below by Qq ^ > (1 + d/r)^"-. 

3. Let T be any stopping time for the transition matrix Q with expectation value E[T] = 
^,^^-i^nPr[T = n] = "^^^q Pr[T > n]. For any integer I > 1 consider the expectation 
value ofT for the I -step transition matrix Q^ defined as E^''^ [T] = ^,^ nPr[(n — 1) * / < 
T <n*l] = E„=o^^[^ >n*l]. Then, 1{E^^^[T] - 1) < E[T] < lE^^^T]. 

Proof. 1. The transition probability corresponding to the matrix element connecting i 
to j is : 






b_, ^ I (n*r)-i yi^O 

(E/3 V/3 + (1 - ^I3)h) ~ 1 1 if i = 



2. First consider any state fi with one bond broken: 

Qo.A = ^^ , > (1 + d/r)-' (A2) 

Assume Qgyn r^. > (1 + d/ry^ for all n broken bond states Yl^=i f'-i^ then 

ue{fj.i,...,i-t„+i} 

> (i+d/rr^ Yl 



> (l + t//r)-"-^ (A3) 

Since Qo.A > (1 + d/r)^^, the assertion holds for all n > 1 by induction. 
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3. We can prove the upper bound as follows: 



E[T] = ^\^{{n-l)l + m)Pr[T = {{n - 1)1 + m)] j 

n=l \m=l / 

< ^n*/ [ ^Pr[T= ((n-l)/ + m)] 

n=l \m=l 

< l^nPr[{n-l)l <T <n*l] 

n=l 



We can similarly prove the lower bound: 

E[T] = '^\^{{n-l)l + m)Pr[T = ((n - 1)/ + m)] 

n=l \m=l 

n=l \m=l / 



(A4) 



> l^nPr[{n-l)l <T <n*l]-l 

n=l 

> /(^(')[T]-1) (A5) 

D 

Lemma A. 3. The expected hitting time from the vertex b to is bounded by 

k < E[Tob] < HI + djrf (A6) 

Proof. The lower bound is trivial since at least k bonds must be repaired before any discon- 
nected state can reach 0. Consider the n* k step probability for transition from b to 0. Let 
Q be the transition matrix restricted to the set V = V" — {0}, i.e., 

Qi,j = Qi,j (1 - Soj - 5io + Soj5io) (A7) 

The probability of a trajectory starting aX i E V reaching in k steps or less is given by 

Pr[Toi <k] = 1-Y, Qli = E I E Q^M' 

jev "=i \iev 

> Ql,>il + d/rr' (AJ 
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where we have used Lemma [A. 21 part (2). Let us define p = (1 + d/r) ^. In terms of p, the 
previous inequahty and Lemma [A. 21 part (3) imply 

Pr[Tob >n*k] = 1 - Pr[Tob < n * k] < {1 - p)"" 

oo 
n=\ 

=^^M < k{\^dlrf (A9) 

D 
An immediate consequence of the previous lemma is that for k even 

A;/2 < ^(2) [To,] < (A;/2) (1 + djrf + 1 (AlO) 

Similarly, if k is odd, using the fact that PtXTq^ < T^b] = we get 

^ < i^^'H^^Ab] < E[Tot]/2 + 1 < ^(1 + d/r)' + 1 (All) 

Let us define the equilibrium probability for 6 as vf^, then 

TTb = :^^^ it k IS even 

^'' if k is odd (A12) 



We can finally compute upper (U) and lower (L) bounds on the hitting time T^o which 
are asymptotically equivalent in the limit r —>■ oo. The following theorem implies that 
A(r) = f/(r) — L(r) is monotonically decreasing in r and linir-^oo A(r)/L(r) = O^^. 

Theorem A.l. The expected number of SSA steps before first passage on a k-connected 
graph is bounded within 

^(l-d/r)--((t-l)/2)f. ^ ^i^^j ^ ^l-(V2(l+d/.r)'= + 2)#. ^^^^^ 

Proof. In order to apply lemma IATTI to bound the hitting time we need to look at graphs with 
k odd or even separately. If k is even we can apply lemma lA.ll directly to Cob for the 2-step 
chain Qe?;en- Howcvcr, if k is odd, we need to consider the cover time between b and each 
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state /t with exactly one broken bond. Then, using the fact Q\0) = (l/X^i/^^^) XIu^mI/*)' 
we get: 

E[no] = 1 + ^ Yl ^/^^M (A14) 

Since Pr[Tbb > Tob] = '^n>mP^[^bb = n]Pr[Tob = rn], for the fc-step chain discussed in 
lemma IA.3I we get: 

d/r 



CXD X 

Pr[Tbb<Tob] < El^TT 

n=l ^ ^ 



d/r) 
=^Pr[Tbb>Tob] > I -d/r (A15) 

Also, Pr[Tbb > Tp^b] > Pr[Tbb > ^ob] > 1 — d/r. Suppose k is even. Then we can estimate 
the cover time Cob = Tbo + 7o6 using lemma lA.ll 

E[Tbo] > 2 * E(2)[r,o] -2 = 2* f . p .^\^ . - i5;(^)[To5] - l) 

\nbPr[Tbb > Tob\ J 

< 2 * E(2) [Tbo] = 2 * f , p .^^ ^ ^ . - i?(^) m) (A16) 

An analogous argument for odd k on using Eq. IA14I gives. 

(1 

< 1 + ^ E ^M f - p r^\^ 1 - 2 * i5(^) [T^d) (A17) 

Finally, using lemma \KA\ and lA.31 we get for all k, 

^ ^ 1 - (k/2{l + d/rf + 2) ^b 

u 

As a corollary to the preceding theorem we get the result stated in section IIIII 
Theorem nil. lI Tlie expected number of SSA steps required to break a /c-connected network 
with k > 1 and r > 1 is Vt{r^~^). 

Proof. Let i and j = i+fi+iy be two graphs with c and c+2 bonds broken respectively. Since 
we are interested in computing the invariant distribution for the irreducible components 
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Qeiien ^^^ Qlddi ^^ ^^^^ compute each matrix element connecting i to j: 

^ii« ^ / J '^ j ,i+p^ i+p,i 



hahv 



(Z)a««(^a + ^PC') + (1 - «a - 5pa)&a) (X)/3 0/3^/3 + (l - ^p)bp) 



M^l ( \ + I ) (A19) 



similarly, 



p=li,iy 



\ *> / (^fl^U 



= ^^ f ^ + ^ ") (A20) 

Detailed balance then implies that 

^i _ ^i,* _ ^A* ^^ ^j,3 



vTi Qi .• a^ ay Wi, 



&M ^i' f, , (^A' + ^!^) ^ (^A* + ^'^) 



< £±^ * r-2 if c 7^ (A21) 



Since tta = ttq n .'U — ^^V— < ttq we can deduce that for any state i with c bonds broken, 
with A; — 1 > c > 1, the invariant probability ttj < c* r^'^+^vro. Let, / be the state with k — 1 
bonds broken for which tt is maximized. The choice of matrix elements imposed by detailed 
balance implies Qi^b > ^/it-i)- Also, since lemma [Al2] implies Xlu^A — ^o(l + d/r) we get 
for all values of k: 

Tit Qb,i ^ (rf-A; + l)(ti) 



TTj Qi,b (k - \)r 

=^nb<—<{d-k + l)(ti)r-'=+i (A22) 

TTO 

Finally, using the lower bound on -E[T5o] computed in preceding theorem we get 

1- {k/2{l + d/r)'' + 2)nb 



E[no] > 2- 






> P{d, k) * r"-' V r > ro (A23) 

where P{d, k) = j;rk^j^T^ (l - (1 + (A: * 2'^-')-')'-^^^^^^^^ and r^ = d. D 
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